Course Information
Course Title
Practices for Distributed Systems and Cloud Application Development (分散式系統與雲端應用開發實務)
Semester
110-2 
Offered to
College of Management, Department of Information Management
Instructor
莊裕澤 
Course Number
IM5057 
Course Identifier
725 U3380 
Section
 
Credits
3.0 
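Full/Half Year
Half year
Required/Elective
Elective
Class Time
Thursday, periods 7, 8, 9 (14:20~17:20)
Classroom
Management Building 2, Room 103 (管二103)
Remarks
Restricted to undergraduates in their third year or above.
Enrollment cap: 40 students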
 
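Core Competency Associations
No core competency associations have been established for this course.
Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course Overview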

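The wave of digitalization brought by emerging technologies such as artificial intelligence, the Internet of Things, big data, and cloud computing is disrupting and reshaping every industry, and moving services to the cloud has become a key strategy for digital transformation. Unlike the development of traditional large-scale application systems, cloud services emphasize rapid deployment, flexibility, and elastic scaling. As a result, concepts such as microservices and containerization have gradually displaced monolithic architecture and virtualization, and have become the industry's mainstream approach to developing cloud application services.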

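This course aims to give students the foundational theory and practical skills needed to develop distributed systems and cloud application services. The content starts with the basics of distributed systems, including the design of distributed algorithms, logical time, consensus, and fault tolerance. It then moves on to large-scale distributed file systems and computing frameworks such as GFS, Hadoop, Ceph, Bigtable, and MapReduce, and to large-scale distributed storage systems based on distributed hash tables (DHTs), such as Dynamo and IPFS. Next come middleware and virtualization, followed by Docker containers, Kubernetes, Amazon ECS, Google Cloud Platform, and other tools commonly used today to develop, deploy, scale, and manage cloud application services. Industry experts will also be invited to help teach, covering tool usage and sharing hands-on development experience, so that the course connects directly to the practical needs of industry.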

The purpose of the course is to provide students with fundamental knowledge of the design and implementation of distributed and cloud systems, as well as practice with popular tools for developing cloud applications. Topics to be covered include distributed algorithms, logical time, consensus, and fault tolerance; large distributed file systems and computing frameworks such as GFS, Hadoop, Ceph, Bigtable, and MapReduce; DHT-based large storage systems such as Dynamo and IPFS; and Docker containers, Kubernetes, Amazon ECS, and Google Cloud Platform. We will also invite guest speakers from industry to share their expertise in the field.

Enrollment request form: https://forms.gle/hB59svUWFkLkpCMW8

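Course Objectives
To provide the foundational theory and practical skills needed to develop distributed systems and cloud application services.
Course Requirements
Basic knowledge of networking technology and Python programming ability.

Students considering this course can gauge whether they have sufficient background by trying to read the two papers below. If these papers are too difficult for you, this course is not a good fit (over the semester we will read about 10 papers of this difficulty):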
1. A distributed algorithm for minimum-weight spanning trees, Robert G. Gallager, Pierre A. Humblet, and P. M. Spira, ACM Transactions on Programming Languages and Systems, vol. 5, no. 1, pp. 66–77, Jan. 1983.
2. The Hadoop Distributed File System: Architecture and Design, D. Borthakur, 2007, or the web page version, HDFS Architecture Guide, Apache.

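Note: Downloading papers from journal and conference databases such as ACM, IEEE, and Elsevier requires access from the NTU network domain; off campus, connect through the VPN.
Expected Weekly Study Hours Outside Class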
 
Office Hours
 
Required Reading
Relevant papers, online materials, and resources are assigned for each unit.
References
Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
Grading
(for reference only)
 
No.  Item          Percentage  Description
1.   Assignments   20%
2.   Midterm exam  30%
3.   Final exam    20%
4.   Term project  30%
 
Course Schedule
Week
Date
Topic
Week 1
2/17  Introduction: Characteristics of Distributed Systems

Reading:
Ch. 1, Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
Week 2
2/24  Basics of Distributed Systems, Part I: Distributed Algorithm, System Models, Name Services, Synchronization, Coordination, Time and Security

Reading:
1. A distributed algorithm for minimum-weight spanning trees, Robert G. Gallager, Pierre A. Humblet, and P. M. Spira, ACM Transactions on Programming Languages and Systems, vol. 5, no. 1, pp. 66–77, Jan. 1983.
An enhanced version by Guy Flysher and Amir Rubinshtein.
2. Ch. 2, 13-15, Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
Week 3
3/3  Basics of Distributed Systems, Part II:
Transaction Processing and Concurrency Control, Replication & Fault-Tolerant Services

Reading:
1. Ch. 16-17, Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
2. Ch. 18, Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
3. Design and Analysis of Distributed Algorithms, Chap. 3, Election, Nicola Santoro, 2006. 
Week 4
3/10  Guest lecturer: Docker Containers, Stefan Hong, CTO & cofounder, Taiwan AI Labs

Ref.:
1. Docker overview
2. Container and Microservice Driven Design for Cloud Infrastructure DevOps, IEEE IC2E 2016.
3. Containers and Cloud: From LXC to Docker to Kubernetes, IEEE Cloud Computing, vol. 1, no. 3, Sept. 2014.
Week 5
3/17  Large Distributed File System, Part I:
Hadoop Distributed File System (HDFS)
The Google File System

Reading:
1. Ch. 12, Distributed Systems: Concepts and Design, 5th Ed., G. Coulouris et al., 2011.
2. The Hadoop Distributed File System: Architecture and Design, D. Borthakur, 2007, or the web page version, HDFS Architecture Guide, Apache.
3. The Hadoop Distributed File System, K. Shvachko, H. Kuang, S. Radia, R. Chansler, IEEE MSST 2010.
4. The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, ACM SOSP 2003.
Week 6
3/24  Large Distributed File System, Part II
Ceph & RADOS

Reading:
1. Ceph: a scalable, high-performance distributed file system, S. A. Weil, et al., OSDI 2006.
2. RADOS: a scalable, reliable storage service for petabyte-scale storage clusters, S. A. Weil, et al., PDSW 2007
3. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data, S. A. Weil, et al., SC 2006.

Large Distributed Storage Systems: Google Bigtable

Reading:
1. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services, S. Gilbert & N. Lynch, ACM SIGACT News, June 2002.
2. Bigtable: A Distributed Storage System for Structured Data, F. Chang, et al., ACM TOCS 2008.
References:
1. Overview of Cloud Bigtable, Google.
Week 7
3/31  P2P File Sharing Networks & Distributed Hash Tables (DHTs), Part I:

Reading:
1. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Ian Clarke, et al., Springer 2001.
2. Incentives Build Robustness in BitTorrent, B. Cohen, Workshop on Economics of Peer-to-Peer Systems, vol. 6, 2003.
3. A scalable content-addressable network, Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker, ACM SIGCOMM Computer Communication Review.

 
Week 8
4/7  Midterm exam
Week 9
4/14  P2P File Sharing Networks & Distributed Hash Tables (DHTs), Part II:
Reading:
1. Chord: a scalable peer-to-peer lookup protocol for internet applications, Ion Stoica, et al., IEEE/ACM Transactions on Networking, vol. 11, no. 1, Feb. 2003.
2. Kademlia: A Peer-to-Peer Information System Based on the XOR Metric, Petar Maymounkov & David Mazières, IPTPS 2002.
3. OceanStore: an architecture for global-scale persistent storage. J. Kubiatowicz, et al., ACM SIGOPS, 2000.
4. S/Kademlia: A practicable approach towards secure key-based routing, I. Baumgart and S. Mies, International Conference on Parallel and Distributed Systems, 2007.
5. Sloppy Hashing and Self-Organizing Clusters, M. J. Freedman, E. Freudenthal, and D. Mazières, IPTPS 2003.
Week 10
4/21  Guest lecturer from Amazon AWS 
Week 11
4/28  Large Distributed Storage Systems: Chubby, Amazon Dynamo, MongoDB, IPFS

Reading:
1. The Chubby lock service for loosely-coupled distributed systems, M. Burrows, OSDI 2006.
2. Dynamo: Amazon’s highly available key-value store, Giuseppe DeCandia, et al., ACM SIGOPS Operating Systems Review, October 2007.
3. IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3), Juan Benet, 2014.
3'. IPFS Concepts & Documents, IPFS.
Week 12
5/5  Guest lecturer

Google Cloud Platform, Google Kubernetes Engine (GKE)

Ref.:
Kubernetes
Week 13
5/12  Consensus, Part 1: Impossibility Results; Part 2: Paxos Algorithm (online, est. 3 hrs of study)
Consensus, Part 3: Byzantine Fault Tolerance (online, est. 1 hr of study)
Reading:
1. Impossibility of distributed consensus with one faulty process, Fischer, M. J.; Lynch, N. A.; Paterson, M. S., Journal of the ACM. 32 (2): 374–382, 1985.
2. Paxos Made Simple, L. Lamport, ACM SIGACT News 32(4), pp. 51-58, 2001.
2'. (alternative) The Part-Time Parliament, L. Lamport, ACM TOCS 16(2), pp. 133-169, 1998.
3. The Byzantine Generals Problem, Lamport, L.; Shostak, R.; Pease, M., ACM Transactions on Programming Languages and Systems. 4 (3): 382–401, 1982.
Week 14
5/19  1. Consensus: online Q&A
2. Term project proposal presentations (5 minutes per group)
Week 15
5/26  Home study
Week 16
6/2  Final exam
Week 17
6/9  Term Projects Demo